Abstract

According to many phenomenological and theoretical studies the distribution
of family name frequencies in a population can be asymptotically described
by a power law. We show that the Galton-Watson process corresponding to the
dynamics of a growing population can be represented in Hilbert space, and
its time evolution may be analyzed by renormalization group techniques, thus
explaining the origin of the power law and establishing the connection
between its exponent and the ratio between the population growth and the
name production rates.

1 Introduction 

The frequency distribution of family names in local communities, regions and
whole countries has been the object of sustained interest by geneticists and
statisticians for more than thirty years, starting from the seminal paper by
Yasuda et al. [1]. For a recent review of the relevant literature we refer to
Colantonio et al. [2], while Scapoli et al. [3] have recently collected and
synthesized their results on the major countries of continental Western
Europe. The main motivation for this research resides in the deep analogy
between surname distributions and the frequency of neutral alleles in a
population: both distributions are generated by an evolutionary branching
process subject to mutation and migration but not conditioned by natural
selection. In particular it has been observed that the dynamics of family
names, in countries with a European family name system, mimics that of the Y
chromosome [4]. Models for such processes have been advanced in the genetic
and statistical literature, starting from the Karlin-McGregor [5] statistical
theory of neutral mutations. A significant theoretical evolution occurred in
particular after Lasker's empirical observation [6] that a power law could
offer a good fit of the observed surname distributions. As a consequence
Panaretos [7] suggested the use of the Yule-Simon distribution, while Consul
[8] proposed to employ the Geeta distribution, with motivations coming from a
branching process model. Evolutionary processes have also attracted the
attention of physicists, who have found that neutral evolution might be a
ground for the application of many techniques of statistical mechanics
[9] [10] [11]. In particular Miyazima et al. [12], studying family name
distributions in Japanese towns, found the systematic emergence of scaling
laws, and further theoretical studies [13] [14] justified the appearance of
power laws of the Yule-Simon type in the case of growing populations with
nonvanishing probability for mutations. A different explanation was offered
by Reed and Hughes [15], who considered a branching process with mutation and
migration and found that the asymptotic form of the distributions should
follow a power law. The most recent and comprehensive result is due to the
Korean group of Baek et al. [16] [17], who wrote down a master equation for
the frequency distribution of family names and its time evolution in the
presence of birth, death, mutation and migration, and found the possibility
of different power laws with exponents depending on the mutation and
migration parameters.

In the present paper we reconsider the models of family name evolution in the
context of a Hilbert space representation of branching processes, and show
that distributions characterized by an asymptotic power law behaviour can be
obtained as solutions of recursive equations which would correspond to the
renormalization group equations of an (equivalent) physical system. In Sec. 2
we introduce and motivate our models. In Sec. 3 we discuss the simpler case
of a system characterized by pure immigration without mutations. Finally in
Sec. 4 we discuss the case with mutation. In Appendix A we represent the
Galton-Watson branching process in a Hilbert space.

2 The models 

In the following sections we shall introduce two models that account for two
different ways of generating new family names in a population: immigration
from abroad and mutation occurring after reproduction. The importance of
the appearance of new family names was pointed out in Refs. [HI O fTC] . 
The analogy of the recursive equations we shall obtain with those typically 
derived by a renormalization-group approach to a physical system will allow 
us to evaluate the asymptotic behavior of the family name distribution N(k), 
where N is the number of family names represented by exactly k individuals. 
Obviously in a typical real situation both immigration and mutation contribute 
to the dynamics of the family name distribution. But in our models we shall 
first focus on a population in which only immigration occurs, and then on one in 
which only mutation occurs. This simplification is justified by the fact that in an 
exponentially growing population (an approximation usually called Malthusian 
law) the effect of immigration can be neglected in comparison to mutation, at 
least in order to study the asymptotic behavior. However, in peculiar historical 
conditions, mutations can be heavily depleted and as a consequence the study of 
a society where name change is only due to immigration retains its value. Since 
we are interested in the family name distribution we can limit our attention to 
the male individuals, which is consistent with the legislation on names present in 
most real societies. In the following we shall use the term "individual" referring 
just to males. Moreover we shall suppose that the evolution of the population 
can be described by the Galton-Watson model. This means we shall consider:

• time as discrete, moving from one generation to the next; 

• the system as completely Markovian;

• each individual as independent of all others. 

The last hypothesis may be considered a very strong restriction if applied to
a biological system, since, for example, the exhaustion of resources induces
a collective behaviour, limiting the growth rate. But we can consider this
hypothesis to be valid in the context of exponential growth of a population.
It is useful to fix some definitions in the use of the Galton-Watson process.
We set:

$$ p_n = \text{probability for an individual to have } n \text{ sons} \tag{1} $$

It is straightforward to introduce the generating function of the
Galton-Watson process:

$$ f(z) = \sum_{n=0}^{\infty} p_n z^n \tag{2} $$

Our hypothesis of a growing population forces us to take $p_n$ such that the
mean number of sons is greater than one:

$$ \sum_{n=1}^{\infty} n\, p_n = f'(1) = m > 1 $$

We will exclude the trivial case $p_n = \delta_{1n}$. We omit the explicit
derivation of the recursive equations, which can be found in detail in
Appendix A. However, their meaning will be fairly intuitive.
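As a minimal sketch of these definitions, the following Python fragment evaluates the generating function and the mean number of sons for a hypothetical offspring distribution. The list `P_OFFSPRING` and the function names are illustrative assumptions, not taken from the text; the list includes the probability of having no sons as its first entry.

```python
# Hypothetical offspring distribution: P_OFFSPRING[n] is the probability
# of having n sons. The values are illustrative assumptions only.
P_OFFSPRING = [0.2, 0.3, 0.3, 0.2]


def f(z, p=P_OFFSPRING):
    """Generating function f(z) = sum_n p_n z^n."""
    return sum(p_n * z**n for n, p_n in enumerate(p))


def mean_offspring(p=P_OFFSPRING):
    """Mean number of sons m = f'(1) = sum_n n p_n."""
    return sum(n * p_n for n, p_n in enumerate(p))


m = mean_offspring()
print(f"f(1) = {f(1.0):.3f}, m = {m:.3f}")
```

For this choice the population is growing, since the mean number of sons exceeds one.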



3 Immigration 

We want to analyze a population whose members increase in number by the
Galton-Watson mechanism while, in addition, a group of individuals comes from
outside. Each son inherits his family name from his father, while the new
individuals coming from outside bring new family names. We are interested in
the asymptotic behaviour of $N(k,t)$, the number of family names represented
by $k$ individuals at time $t$. The values $N(k,0) = N_0(k)$ are assigned as
initial conditions of the problem, with:

$$ \sum_{k=1}^{\infty} N_0(k) = S_0 < \infty, \qquad \sum_{k=1}^{\infty} k\, N_0(k) = N_0 < \infty \tag{3} $$

where $S_0$ is the initial number of family names and $N_0$ is the initial
number of individuals. We introduce the generating function:

$$ n_t(z) = \sum_{k=0}^{\infty} N(k,t)\, z^k \tag{4} $$

Now we suppose that the individuals from outside always arrive distributed in
the same manner: $\theta(k)$ is the number of new family names represented by
$k$ individuals among them. We suppose the number of new family names
$\Theta_0$ and the number of individuals $G_0$ to be finite:

$$ \Theta_0 = \sum_k \theta(k), \qquad G_0 = \sum_k k\, \theta(k) $$

As before we introduce the generating function:

$$ \theta(z) = \sum_{k=0}^{\infty} \theta(k)\, z^k $$

We can obtain a recursive equation for $n_t(z)$ involving $\theta(z)$. The
explicit derivation is given in Appendix A.



$$ n_{t+1}(z) = n_t(f(z)) + \theta(z) \tag{5} $$

A formal solution is given by:

$$ n_t(z) = n_0(f_t(z)) + \sum_{k=0}^{t-1} \theta(f_k(z)) $$

where $f_k(z)$ indicates the function $f(z)$ iterated $k$ times. From this
expression it is easy to compute the mean number of individuals $N_t$ and the
mean number of family names $S_t$ at time $t$:

$$ N_t = n_t'(1) = N_0 m^t + G_0 \sum_{k=0}^{t-1} m^k = \left( N_0 + \frac{G_0}{m-1} \right) m^t - \frac{G_0}{m-1} \tag{6} $$

$$ S_t = n_t(1) = S_0 + t\, \Theta_0 \tag{7} $$
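Differentiating Eq. (5) at z = 1, and using f(1) = 1 and f'(1) = m, gives the one-step recursion N_{t+1} = m N_t + G_0 for the mean number of individuals, where G_0 is the number of immigrating individuals per generation. The sketch below checks the closed form of Eq. (6) against this recursion and evaluates Eq. (7); the function names and the numerical values are hypothetical, chosen only for illustration.

```python
def mean_individuals_recursive(n0, g0, m, t):
    """Iterate N_{t+1} = m * N_t + G_0 starting from N_0."""
    n = n0
    for _ in range(t):
        n = m * n + g0
    return n


def mean_individuals_closed(n0, g0, m, t):
    """Closed form: (N_0 + G_0/(m-1)) m^t - G_0/(m-1), valid for m != 1."""
    return (n0 + g0 / (m - 1.0)) * m**t - g0 / (m - 1.0)


def mean_names(s0, theta0, t):
    """Each generation adds Theta_0 new family names: S_t = S_0 + t Theta_0."""
    return s0 + t * theta0


# Illustrative (assumed) parameters.
N0, G0, M, S0, THETA0, T = 100.0, 5.0, 1.5, 20.0, 2.0, 10
print(mean_individuals_recursive(N0, G0, M, T))
print(mean_individuals_closed(N0, G0, M, T))
print(mean_names(S0, THETA0, T))
```

The number of individuals grows geometrically while the number of names grows only linearly in t, which is what drives the limit studied next.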

We are interested in the limit $t \to \infty$ and in the asymptotic behaviour
for $k \gg 1$. In order to achieve this goal, we notice that Eq. (5) is
formally analogous to the equations coming from the renormalization group
approach, linking the system at two different degrees of magnification.
Therefore the system can be studied by using this analogy with the
corresponding physical system. More explicitly, suppose $\Phi_n(T)$ is the
free energy of a hierarchical model, at scale $n$ and temperature $T$. With
standard renormalization group methods, we can obtain the recursive equation
linking two different scales (see [18]):

$$ \Phi_{n+1}(T) = g(T) + \frac{1}{\mu}\, \Phi_n(\phi(T)) \tag{8} $$

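where $g(T)$ is a regular function that comes from summing over the degrees
of freedom of the smaller scale and $\phi(T)$ is the RG flow. Then near the
critical point $T_c$, for large $n$:

$$ \Phi_n(T) \sim |T - T_c|^{a}, \qquad a = \frac{\ln \mu}{\ln \phi'(T_c)} \tag{9} $$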

Eq. (8) is formally analogous to Eq. (5); in our case the role of the flow is
played by the Galton-Watson generating function $f(z)$, and so the phases and
the critical points correspond to the fixed points of $f(z)$:

$$ f(z) = z \tag{10} $$

From the fact that $f(z)$ is convex and $f'(1) > 1$, we find that Eq. (10)
has three solutions: $q$, $1$, $\infty$.^1 From the Galton-Watson theory we
know that $q \in [0,1)$ is the extinction probability. Moreover it is easy to
see that $f'(q) < 1$. In fact, if it were $f'(q) > 1$, one would have by
convexity:

$$ f(1) \geq f(q) + f'(q)(1 - q) > 1 $$

which contradicts $f(1) = 1$.

^1 More precisely, these are the possible outcomes of
$\lim_{n \to \infty} f_n(z_0)$ for different values of $z_0$.

So we have that $q$ and $\infty$ are attractive, while $1$ is a repulsive
fixed point which separates the two stable phases. We get a critical
behaviour near 1:

$$ n(z) = \lim_{t \to \infty} n_t(z) \sim (1 - z)^{a} $$
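The fixed-point structure of the generating function can be checked numerically. The sketch below, for a hypothetical offspring distribution (an assumption made only for illustration), iterates z -> f(z) from z = 0 to reach the attractive fixed point q and compares the slopes f'(q) and f'(1).

```python
def f(z, p):
    """Generating function f(z) = sum_n p_n z^n."""
    return sum(p_n * z**n for n, p_n in enumerate(p))


def f_prime(z, p):
    """Derivative f'(z) = sum_n n p_n z^(n-1)."""
    return sum(n * p_n * z**(n - 1) for n, p_n in enumerate(p) if n > 0)


def extinction_probability(p, iterations=200):
    """Iterate z -> f(z) from z = 0; the orbit converges to the fixed point q."""
    z = 0.0
    for _ in range(iterations):
        z = f(z, p)
    return z


# Hypothetical offspring distribution with mean m = 1.5 (assumed values).
P_OFFSPRING = [0.2, 0.3, 0.3, 0.2]
q = extinction_probability(P_OFFSPRING)
print(f"q = {q:.6f}, f'(q) = {f_prime(q, P_OFFSPRING):.6f}, "
      f"f'(1) = {f_prime(1.0, P_OFFSPRING):.6f}")
```

A slope smaller than one at q and larger than one at 1 is the numerical counterpart of q being attractive and 1 being repulsive.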

One can see that in this case we have, for $t \gg 1$:

$$ N(k,t) \sim k^{-(a+1)} \tag{11} $$

To compute $a$, we take $\mu = 1$, $T_c = 1$, $m = f'(1) = \phi'(T_c)$ in
Eq. (9) and we notice we are in an atypical situation in which $a = 0$. It
means that the function is diverging more slowly than any power and it is
easy to see that it is logarithmic. In fact, using Eqs. (6) and (7):

$$ A = \lim_{z \to 1} n'(z)\, m^{-n(z)/\Theta_0} = \lim_{z \to 1} \lim_{t \to \infty} n_t'(z)\, m^{-n_t(z)/\Theta_0} = \lim_{t \to \infty} \lim_{z \to 1} n_t'(z)\, m^{-n_t(z)/\Theta_0} = \left( N_0 + \frac{G_0}{m-1} \right) m^{-S_0/\Theta_0} \tag{12} $$

So we get near 1:

$$ n'(z) \simeq \left( N_0 + \frac{G_0}{m-1} \right) m^{(n(z) - S_0)/\Theta_0} = A\, e^{b\, n(z)} $$

where we set $b = \ln(m)/\Theta_0$. It can be solved,^2 giving:

$$ n(z) \simeq -\frac{1}{b} \left( \log(A b) + \log(1 - z) \right) $$

which ensures the logarithmic divergence and implies for large $k$:

$$ N(k) = \lim_{t \to \infty} N(k,t) \simeq \frac{C}{k}\, (1 + o(1)) $$

So for immigration we find a power-law behaviour with exponent $-1$. Notice
that this behaviour is completely independent of the initial condition and of
the distribution of the immigrating family names at each generation.
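As a sanity check of the logarithmic solution, the sketch below verifies numerically that n(z) = -(1/b)(log(Ab) + log(1 - z)) satisfies the differential equation n'(z) = A exp(b n(z)). The values of A and b are arbitrary assumptions, and the derivative is approximated by a central difference rather than computed analytically.

```python
import math


def n_log(z, a_const, b):
    """Candidate solution n(z) = -(1/b) * (log(A b) + log(1 - z)), z < 1."""
    return -(math.log(a_const * b) + math.log(1.0 - z)) / b


def relative_residual(z, a_const, b, h=1e-6):
    """Relative mismatch between n'(z) and A * exp(b * n(z))."""
    derivative = (n_log(z + h, a_const, b) - n_log(z - h, a_const, b)) / (2.0 * h)
    rhs = a_const * math.exp(b * n_log(z, a_const, b))
    return abs(derivative - rhs) / rhs


# Arbitrary (assumed) constants, used only to exercise the check.
A_CONST, B = 3.0, 0.4
for z in (0.5, 0.9, 0.99):
    print(z, relative_residual(z, A_CONST, B))
```

Since -log(1 - z) has the Taylor expansion sum_k z^k / k, the coefficient of z^k in this solution is 1/(b k), which is the k^{-1} behaviour quoted in the text.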



^2 The arbitrary constant can be fixed by imposing that the solution
diverges at $z = 1$.

4 Mutation

The context is analogous to the previous one but we no longer have
immigration. We use again the initial condition in Eq. (3). Now, each son has
a certain probability $p$ that his family name mutates into a new one,
different from his father's. We suppose that $p$ does not depend on the
family and we neglect the case in which two or more sons take the same new
family name. This means the Galton-Watson contribution is modified, since only
a part proportional to $1 - p$ of the offspring holds the same family name
and the remaining part is added to the families of size 1. This implies the
equation:

$$ n_{t+1}(z) = n_t(f(z^{1-p})) + p\, m\, n_t'(1)\, z \tag{13} $$



where we used the fact that $n_t'(1)$ equals the total number of individuals
at generation $t$. Observe that mutations do not contribute to the total
number of individuals and so:

$$ N_t = n_t'(1) = N_0 m^t $$

as can be shown directly via Eq. (13). The recursive equation can now be
solved, at least formally. Defining $r(z) = f(z^{1-p})$ and indicating by
$r_k(z)$ the function $r(z)$ iterated $k$ times, we get the solution:

$$ n_t(z) = n_0(r_t(z)) + p N_0 \sum_{n=0}^{t-1} m^{t-n}\, r_n(z) > p N_0 m^t\, r_0(z) $$

The last inequality shows that no limit in $t$ can exist. However we can
obtain a limit for the function:

$$ \eta_t(z) = n_t(z)\, m^{-t} $$

Since for large $t$ we have $n_t(1) \propto m^t$, as one can check by putting
$z = 1$ in Eq. (13), we are basically considering the distribution normalized
to the total number of families. So we can put Eq. (13) in the form:

$$ \eta_{t+1}(z) = p N_0 z + \frac{\eta_t(r(z))}{m} \tag{14} $$

which is again in the form of Eq. (8). However the flow is slightly changed
with respect to the Galton-Watson generating function. We have
$r'(1) = (1 - p)m$ and we suppose $p$ small enough for 1 to be a repulsive
fixed point for the flow. In this case we must have a critical behaviour
near 1:

$$ \eta(z) = \lim_{t \to \infty} \eta_t(z) \sim (1 - z)^{a} $$

where the exponent can be obtained using Eq. (9), with $\mu = m$:

$$ a = \frac{\ln(m)}{\ln(r'(1))} = \frac{\ln(m)}{\ln(m) + \ln(1 - p)} $$
Using Eq. (11) we get the exponent of the family name power-law distribution:

$$ \gamma = a + 1 = \frac{2\ln(m) + \ln(1 - p)}{\ln(m) + \ln(1 - p)} \simeq 2 + \frac{p}{\ln(m)} $$

where we considered $p$ very small, as is true in real situations (see
[16]). Again the behaviour is completely independent of the initial condition
and shows the typical features of a scale-free system.
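To give a feeling for the size of the correction to the exponent 2, the sketch below evaluates gamma = a + 1 with a = ln(m) / (ln(m) + ln(1 - p)) and compares it with the small-p approximation 2 + p / ln(m). The function names and the values of m and p are assumptions chosen for illustration, not estimates from real data.

```python
import math


def exponent_a(m, p):
    """Critical exponent a = ln(m) / ln(r'(1)) with r'(1) = (1 - p) * m."""
    if not (0.0 <= p < 1.0) or (1.0 - p) * m <= 1.0:
        raise ValueError("need 0 <= p < 1 and (1 - p) * m > 1")
    return math.log(m) / (math.log(m) + math.log(1.0 - p))


def gamma_exact(m, p):
    """Power-law exponent gamma = a + 1."""
    return exponent_a(m, p) + 1.0


def gamma_small_p(m, p):
    """First-order approximation gamma ~ 2 + p / ln(m) for small p."""
    return 2.0 + p / math.log(m)


# Illustrative (assumed) growth and mutation parameters.
M, P = 1.5, 0.01
print(gamma_exact(M, P), gamma_small_p(M, P))
```

The guard on (1 - p) m > 1 mirrors the requirement in the text that p be small enough for 1 to remain a repulsive fixed point of the flow.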
5 Conclusion



In this paper we represented the Galton-Watson process as a quantum
evolution, defining the Hilbert space and the time-evolution operator
corresponding to the Galton-Watson probabilities. In this way we obtained two
recursive equations for two possible models with different family name
production mechanisms: immigration and mutation. The structure of the
branching allowed us to interpret these equations as the ones that connect
different scales of a physical system and, in particular, the asymptotic
behaviour corresponds to the power law emerging near the critical point. The
exponents are consistent with those evaluated in Refs. [16] [17] with a
master equation approach: $N(k)$ goes as $k^{-1}$ for a society where name
change is only due to immigration and, approximately, as $k^{-2}$ for a
society where family name mutation occurs. Our method shows the robustness of
these results, which are independent of the offspring distribution. Possible
extensions of the model remain to be investigated and will be the object of
future studies.
A The Galton-Watson process in a Hilbert space



The branching structure that characterizes the Galton-Watson process allows
us to consider reproduction governed by chance as a decay process whose
interaction is given by a Hamiltonian which, as we will see, is not
Hermitian. We first introduce the creation and destruction operators at each
time with the usual commutation rules:

$$ [a_k, a_h] = 0 \tag{15} $$

$$ [a_k^\dagger, a_h^\dagger] = 0 \tag{16} $$

$$ [a_k, a_h^\dagger] = \delta_{kh} \tag{17} $$

where $a_k^\dagger$ and $a_k$, respectively, create and destroy an individual
at time $k$. The Hilbert space is obtained in the usual way, acting on the
vacuum Fock state with polynomials in $a_t^\dagger$ for all possible values
of $t$. A basis for the space is then given by the following set:

$$ |n,t\rangle = (a_t^\dagger)^n |0\rangle, \qquad \langle n,t| = \langle 0| (a_t)^n \tag{18} $$

Then, at each time $t$, the state of the system, which is determined by the
probability $b_k(t)$ that exactly $k$ individuals are present, can be written:

$$ |\phi(t)\rangle = \sum_k b_k(t)\, |k,t\rangle $$

It must be possible to connect the dynamics to the parameters $p_n$
introduced in Eq. (1) and so to the generating function $f(z)$ of Eq. (2).
This can be done by setting the Hamiltonian as:

$$ H(t) = f(a_{t+1}^\dagger)\, a_t \tag{19} $$

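We can write the time-evolution operator:

$$ U(t) \equiv \exp(H(t)) \tag{20} $$

which is the one-time-step evolution operator: it evolves the states at time
$t$ to time $t + 1$,^3 giving the correct probabilities according to the
Galton-Watson process. In fact, one can easily check that:

$$ U(t)\, |1,t\rangle = U(t)\, a_t^\dagger |0\rangle = f(a_{t+1}^\dagger)\, |0\rangle = \sum_n p_n (a_{t+1}^\dagger)^n |0\rangle \tag{21} $$

^3 It should be observed that the correct expression for $U(t)$ should be:

$$ U(t) = P_t\, e^{H(t)} $$

where $P_t$ destroys all the states at time $t$: $P_t |0\rangle = |0\rangle$,
$P_t (a_t^\dagger)^n |0\rangle = 0$ for $n \geq 1$ and, $\forall h \neq t$,
$[P_t, a_h^\dagger] = 0$. In this way we eliminate all the parts of the
states that do not evolve to time $t + 1$. E.g., for the terms of first and
second order in $H(t)$:

$$ \left( H(t) + \frac{1}{2} H(t)^2 \right) (a_t^\dagger)^2 |0\rangle = \left( 2 f(a_{t+1}^\dagger)\, a_t^\dagger + f(a_{t+1}^\dagger)^2 \right) |0\rangle $$

and the operator $P_t$ then eliminates the first term in the parenthesis,
giving the correct result.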
And in general, by linearity, we know that given a state with a particular
probability distribution we get the state at time $t + 1$ correctly evolved.
Starting from a state $|\phi_0\rangle$ at time 0 we can obtain the state at
time $T$ by:

$$ |\phi_T\rangle = \mathcal{U}(T)\, |\phi_0\rangle = U(T-1)\, U(T-2) \cdots U(0)\, |\phi_0\rangle \tag{22} $$

and this equation defines the full time-evolution operator. We now see how to
derive Eq. (5). We want to write the equation of evolution for:

$$ |n(t)\rangle = \sum_{k=0}^{\infty} N(k,t)\, |k,t\rangle $$

With the notation of Section 3 we define the state:

$$ |\theta(t)\rangle = \sum_{k=0}^{\infty} \theta(k)\, |k,t\rangle $$

In the absence of immigration the evolution would be simply given by
Eq. (22). But here, at each step, the number of family names represented by
$k$ individuals grows due to the individuals coming from outside. So we get
the equation:

$$ |n(t+1)\rangle = U(t)\, |n(t)\rangle + |\theta(t)\rangle \tag{23} $$

Now we use the map $\mathcal{W}$ from the Hilbert space to $C^\infty[0,1]$
defined on the basis of Eq. (18) as:

$$ |n,t\rangle = (a_t^\dagger)^n |0\rangle \mapsto z_t^n \tag{24} $$

And in general: $\mathcal{W}(|\phi(t)\rangle) = \phi(z_t) \in C^\infty[0,1]$.
The action of an operator becomes an integral transformation. For $U(t)$ we
have a simple kernel of the form
$U(t) \mapsto U(z_t, z_{t+1}) = \delta(z_t - f(z_{t+1}))$, as can be deduced
from Eq. (21). Then:

$$ \phi_{t+1}(z_{t+1}) = \mathcal{W}(U(t)\, |\phi(t)\rangle) = \int U(z_t, z_{t+1})\, \phi(z_t, t)\, dz_t = \phi_t(f(z_{t+1})) $$

where $\phi_t(z) = \mathcal{W}(|\phi(t)\rangle)$. Acting on Eq. (23) with the
map $\mathcal{W}$ we get Eq. (5).
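The composition rule for one time step, in which the generating function at time t + 1 is the one at time t evaluated at f(z), can be illustrated with explicit polynomial coefficients. The sketch below is a plain-Python illustration under assumed inputs (a hypothetical offspring distribution and a population of exactly two individuals); it does not implement the operator formalism itself.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest order first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def poly_compose(outer, inner):
    """Coefficients of outer(inner(z)), built with Horner's scheme."""
    result = [outer[-1]]
    for coeff in reversed(outer[:-1]):
        result = poly_mul(result, inner)
        result[0] += coeff
    return result


# Hypothetical offspring distribution (assumed values), i.e. coefficients of f(z).
P_OFFSPRING = [0.2, 0.3, 0.3, 0.2]
# State with exactly two individuals at time t: phi_t(z) = z^2.
phi_t = [0.0, 0.0, 1.0]

# One step of the evolution: phi_{t+1}(z) = phi_t(f(z)) = f(z)^2.
phi_next = poly_compose(phi_t, P_OFFSPRING)
mean_next = sum(k * b_k for k, b_k in enumerate(phi_next))
print(phi_next, mean_next)
```

The coefficients of `phi_next` are the probabilities of finding k individuals in the next generation; they sum to one and their mean is twice the mean number of sons, as expected for two independent individuals.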